Loose Phrase String Kernels

Author

  • Janez Brank
Abstract

When representing textual documents by feature vectors for the purposes of further processing (e.g. for categorization, clustering, or visualization), one possible representation is based on “loose phrases” (also known as “proximity features”). This is a generalization of n-grams: a loose phrase is considered to appear in a document if all the words from the phrase occur sufficiently close to each other. We describe a kernel that corresponds to the dot product of documents under a loose phrase representation. This kernel can be plugged into any kernel method to deal with documents in the loose phrase representation instead of the bag of words representation.
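As a rough illustration of the representation described in the abstract, the sketch below builds loose phrase features explicitly and takes their dot product. It is not the kernel computation from the paper, which is not reproduced on this page. The function names, the binary (present/absent) feature values, the order-insensitive treatment of phrases, and the `max_len` and `window` parameters are all assumptions made for this example.

```python
from itertools import combinations


def loose_phrases(tokens, max_len=2, window=3):
    """Return the set of loose phrases (as frozensets of words) found in tokens.

    Assumption: a phrase "appears" if all of its words occur inside some
    span of `window` consecutive tokens; word order is ignored.
    """
    found = set()
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        first = span[0]
        rest = sorted(set(span[1:]) - {first})
        # Anchor every phrase on the first token of the span, then add
        # up to max_len - 1 further distinct words from the same span.
        for k in range(max_len):
            for combo in combinations(rest, k):
                found.add(frozenset((first,) + combo))
    return found


def loose_phrase_kernel(doc_a, doc_b, max_len=2, window=3):
    """Dot product of two binary loose-phrase feature vectors (assumed form)."""
    a = loose_phrases(doc_a.lower().split(), max_len, window)
    b = loose_phrases(doc_b.lower().split(), max_len, window)
    return len(a & b)


doc_a = "the quick brown fox"
doc_b = "the brown lazy fox"
loose = loose_phrase_kernel(doc_a, doc_b, max_len=2, window=3)
strict = loose_phrase_kernel(doc_a, doc_b, max_len=2, window=2)
print(loose, strict)
```

With `window=2` only adjacent word pairs count, so the two example documents share just their single words. Widening the window lets pairs such as "the ... brown" and "brown ... fox" match even though other words intervene, which is the generalization of n-grams the abstract refers to. Enumerating features explicitly like this grows quickly with `max_len` and `window`, so it is only suitable for small illustrations.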

Similar resources

The Leaf Path Projection View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

We present a novel representation of parse trees as lists of paths (leaf projection paths) from leaves to the top level of the tree. This representation allows us to achieve significantly higher accuracy in the task of HPSG parse selection than standard models, and makes the application of string kernels natural. We define tree kernels via string kernels on projection paths and explore their pe...

Scalable Algorithms for String Kernels with Inexact Matching

We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatc...

String Kernels

This paper provides an overview of string kernels. String kernels compare text documents by the substrings they contain. Because of their high computational complexity, methods for approximating string kernels are shown. Several extensions of string kernels are also presented. Finally, string kernels are compared to the bag-of-words (BOW) representation.

Position-Aware String Kernels with Weighted Shifts and a General Framework to Apply String Kernels to Other Structured Data

In combination with efficient kernel-based learning machines such as the Support Vector Machine (SVM), string kernels have proven to be significantly effective in a wide range of research areas (e.g. bioinformatics, text analysis, voice analysis). Many of the string kernels proposed so far take advantage of simpler kernels such as trivial comparison of characters and/or substrings, and are classifie...

A Randomized String Kernel and Its Application to RNA Interference

String kernels directly model sequence similarities without the necessity of extracting numerical features in a vector space. Since they better capture complex traits in the sequences, string kernels often achieve better prediction performance. RNA interference is an important biological mechanism with many therapeutical applications, where strings can be used to represent target messenger RNAs...

Publication date: 2006